Greek ToBI: A System for the Annotation of Greek Speech Corpora
Abstract
Greek ToBI (GRToBI) is a system for annotating (Standard) Greek spoken corpora that encodes intonational, prosodic, and phonetic information. It is used to develop a large, publicly available database of prosodically annotated utterances for research, engineering, and educational purposes. Greek ToBI is based on the ToBI system developed for American English, but it adds novel features (“tiers”) designed to address particularities of Greek prosody that merit annotation, such as stress and juncture. Accordingly, Greek ToBI includes five tiers: the Tone Tier shows the intonational analysis of the utterance; the Prosodic Words Tier gives a phonetic transcription; the Break Index Tier shows indices of cohesion; the Words Tier gives the text in romanization; and the Miscellaneous Tier encodes other relevant information (e.g., disfluency or pitch halving). The development of GRToBI is based largely on the transcription and analysis of a corpus of spoken Greek that includes data from several speakers and speech styles, and it also draws on existing quantitative research on Greek prosody.
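One way to picture the five-tier design is as parallel, time-aligned label sequences over a single utterance. The Python sketch below models that idea. The tier names, the choice to store every label as a single time point, the assumption that word and break-index labels both sit at word-end times, and the placeholder labels in the usage lines are all hypothetical illustrations. They are not the GRToBI file format or its label inventory.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical identifiers for the five tiers named in the abstract.
TIER_NAMES = ("tones", "prosodic_words", "break_indices", "words", "misc")


@dataclass(frozen=True)
class Label:
    """A single time-stamped label on one tier (time in seconds)."""
    time: float
    text: str


@dataclass
class Utterance:
    """An utterance annotated on parallel tiers that share one time axis."""
    duration: float
    tiers: Dict[str, List[Label]] = field(
        default_factory=lambda: {name: [] for name in TIER_NAMES}
    )

    def add(self, tier: str, time: float, text: str) -> None:
        if tier not in self.tiers:
            raise KeyError(f"unknown tier: {tier}")
        if not 0.0 <= time <= self.duration:
            raise ValueError(f"time {time} outside utterance (0..{self.duration})")
        self.tiers[tier].append(Label(time, text))
        # Keep each tier in temporal order so it can be read left to right.
        self.tiers[tier].sort(key=lambda lab: lab.time)

    def labels_between(self, tier: str, start: float, end: float) -> List[Label]:
        """Return the labels on `tier` whose time falls in [start, end)."""
        return [lab for lab in self.tiers[tier] if start <= lab.time < end]


def words_with_breaks(utt: Utterance, tol: float = 0.005) -> List[Tuple[str, Optional[str]]]:
    """Pair each word with the break-index label at (about) the same time.

    Assumes (hypothetically) that both tiers place labels at word-end times;
    `tol` in seconds absorbs small differences between hand-placed labels.
    Words with no matching break label are paired with None.
    """
    pairs = []
    for word in utt.tiers["words"]:
        match = next(
            (b.text for b in utt.tiers["break_indices"] if abs(b.time - word.time) <= tol),
            None,
        )
        pairs.append((word.text, match))
    return pairs


if __name__ == "__main__":
    # Placeholder labels for illustration only.
    utt = Utterance(duration=1.2)
    utt.add("words", 0.35, "o")
    utt.add("words", 0.90, "manos")
    utt.add("break_indices", 0.35, "1")
    utt.add("break_indices", 0.90, "4")
    utt.add("tones", 0.60, "L+H*")
    utt.add("tones", 1.10, "L-L%")
    print(words_with_breaks(utt))  # [('o', '1'), ('manos', '4')]
```

Keeping every tier on a shared time axis, rather than nesting one tier inside another, lets each tier be added or queried independently. This reflects the way the abstract presents the tiers as separate layers of information about the same utterance.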
Similar resources
An Autosegmental-Metrical Analysis and Prosodic Annotation Conventions for Cantonese
This paper introduces the C_ToBI (Cantonese Tones and Break Indices) conventions for modern Cantonese. These conventions, developed within the Autosegmental-Metrical approach of the ToBI framework, are designed for use in annotating and exploring tone and juncture phenomena in spoken Cantonese corpora. Tone and juncture phenomena of especial interest for prosodic typology includ...
A CART approach for Duration Modeling of Greek Phonemes
This paper describes the construction and evaluation of a segmental duration prediction model for the Greek language using the CART (Classification and Regression Tree) machine learning approach. A ToBI-annotated prosodic speech corpus was used to construct the training and test sets. Our phoneme set consisted of 34 phonemes distributed over 32,072 instances (in 5....
A Real-World Emotional Speech Corpus for Modern Greek
The present paper deals with the design and annotation of a Greek real-world emotional speech corpus. The speech data consist of recordings collected during the interaction of naïve users with a smart-home dialogue system. The speech data were annotated with respect to the uttered command and the speaker's emotional state. Initial experiments towards recognizing negative emotional state...
Speech annotation and corpus tools
The growth in the use of speech corpora has benefited in the last 10 years from the establishment of data centres, such as the Linguistic Data Consortium (LDC), the European Language Resources Association (ELRA), the Japanese Language Resource Consortium (GSK: Gengo Shigen Kyouyuukikou), and multi-site annotation initiatives, such as the ToBI system for prosodic annotation and the DAMSL system ...
A comparison of inter-transcriber reliability for two systems of prosodic annotation: RaP (Rhythm and Pitch) and ToBI (Tones and Break Indices)
Agreement was investigated among five labelers for the use of two prosodic annotation systems: the ToBI (Tones and Break Indices) system [1,2] and the RaP (Rhythm and Pitch) system [3]. Each system permits the labeling of pitch accents and two levels of phrasal boundaries; RaP also permits labeling of speech rhythm and distinguishes multiple levels of prominence on syllables. After training wit...